Add support for tracing multiple energy groups in Rover #1619
Conversation
JustinPrivitera
left a comment
This is really fantastic work. Thanks so much for this. I just had a few clarifying comments and questions.
..._baseline_images/tout_rover_xray_blueprint_multiple_groups_intensities_spatial_z0_000048.png
Why is this zoomed in like this?
As best I can tell, it seems to be due to using z = 0 in the slice. Z values close to 0 (0.1, 0.01, or 0.001) give the same legend values as z = 0 but are not zoomed out, so that's my current "fix" for this. Perhaps this is an Ascent bug when slicing precisely on a boundary edge?
Part of me wonders if we should open a ticket, but at the same time, it seems questionable in general to slice on a boundary edge. For the VisIt tests, I calculated the midpoints of the energy groups to take the slices. Glad you got this figured out.
I guess we can defer to @cyrush on whether or not this counts as a bug (context: getting no output when slicing precisely on a boundary). If so, I'd be happy to open an issue.
To your second point, I may as well change the current implementation to use the midpoints also. I don't think it will make a difference in the baselines, but consistency with VisIt is good.
Forgot to follow up, but I did switch to slicing at the midpoints instead.
I don't think slicing on a boundary is an important case to worry about. I like using midpoints.
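For reference, a minimal sketch of the midpoint idea, assuming the energy group boundaries are available as a plain array; the names here are illustrative, not the actual test code:

```cpp
// Illustrative sketch: take slices at the midpoint of each energy group
// rather than exactly on a group boundary such as z = 0.
#include <cstddef>
#include <vector>

std::vector<double> slice_locations(const std::vector<double> &group_bounds)
{
  std::vector<double> midpoints;
  for (std::size_t i = 0; i + 1 < group_bounds.size(); ++i)
  {
    midpoints.push_back(0.5 * (group_bounds[i] + group_bounds[i + 1]));
  }
  return midpoints; // e.g. bounds {0, 1, 2, 3} -> slices at {0.5, 1.5, 2.5}
}
```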
The failing Windows CI seems to be unrelated to my latest commit. The next thing I need to do is implement those additional tests, particularly to make sure that this works with MPI. As for my comment about performance, I realized I was going by the total test time, which includes time spent creating the additional fields. I need to measure just the ascent execute calls for a proper comparison.
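As an aside, a minimal sketch of timing only the execute call, assuming Ascent's usual C++ API (ascent::Ascent plus conduit::Node actions); this is not the actual test harness:

```cpp
// Sketch only: wraps just the execute call in a timer so field-creation time
// is excluded from the comparison.
#include <ascent.hpp>
#include <conduit.hpp>
#include <chrono>
#include <iostream>

double timed_execute(ascent::Ascent &a, const conduit::Node &actions)
{
  const auto start = std::chrono::steady_clock::now();
  a.execute(actions); // the call whose cost we actually want to compare
  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  std::cout << "ascent execute took " << seconds << " s\n";
  return seconds;
}
```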
Seems to work out of the box with MPI. Separately, I realized that …
And finally, after looking at just the …
src/libs/rover/engine.cpp
if (num_absorption_bins != num_emission_bins)
{
  ASCENT_LOG_ERROR("Error - Engine::get_num_energy_groups: number of energy groups in absorption field ("
I wonder if we need parallel error checking for a case like this. If this gets called by multiple ranks and one fails, then we are in trouble. Although it's very unlikely we would have a mismatch here.
I solved this, but it was a slightly more invasive change than expected, so let me know what you think. Adding parallel error checking in place (inside engine.cpp) didn't work because it caused the MPI test that puts all of the data on rank 0 to fail. It turns out that the ranks that didn't have any data would never reach the MPI_Allreduce, causing rank 0 to hang. I was admittedly pleased to see that ranks with nothing to do weren't bothering to do the setup work required before tracing rays.
My workaround was to move the parallel error check to TypedScheduler, to a place that we know all of the ranks will reach and execute. We still check for a mismatched number of fields inside engine.cpp, but instead of throwing an error right away, detecting a mismatch sets a member variable. Then, inside TypedScheduler, each rank loops over its domains and checks whether any of them had a mismatch. Ranks that had no domains won't have anything to iterate over and will therefore not report any mismatches.
I don't think we currently have a test that exercises this, but I could come up with one if you'd like.
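For illustration, a minimal sketch of that pattern; the Domain struct, member names, and function here are hypothetical stand-ins, not the actual Rover/TypedScheduler code:

```cpp
// Hypothetical sketch: each rank reduces a local mismatch flag at a point
// every rank is guaranteed to reach, so ranks that own no data still
// participate in the collective and nobody hangs on the MPI_Allreduce.
#include <mpi.h>
#include <stdexcept>
#include <vector>

struct Domain
{
  // set earlier (e.g. in engine.cpp) when absorption/emission bin counts differ
  bool group_count_mismatch = false;
};

void check_energy_group_counts(const std::vector<Domain> &domains, MPI_Comm comm)
{
  int local_mismatch = 0;
  for (const Domain &dom : domains) // empty on ranks with no domains
  {
    if (dom.group_count_mismatch)
    {
      local_mismatch = 1;
    }
  }

  int global_mismatch = 0;
  MPI_Allreduce(&local_mismatch, &global_mismatch, 1, MPI_INT, MPI_MAX, comm);

  if (global_mismatch != 0)
  {
    // every rank throws together, so the run fails cleanly instead of deadlocking
    throw std::runtime_error("mismatched number of absorption and emission energy groups");
  }
}
```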
It is always best to write tests to capture all the cases we can, but we aren't always good about it. If you want to add a test you can, but I'm not even sure how you would cause this: the number of absorption bins would need to differ from the number of emission bins on only some ranks. I imagine we would hit a Conduit Relay read error before we ever made it here.
...t_rover_xray_blueprint_multiple_groups_absorption_only_optical_depth_spatial_full_000048.png
...tests/_baseline_images/tout_rover_xray_blueprint_multiple_groups_intensities_full_000048.png
JustinPrivitera
left a comment
Just some minor things about parallel error checking and MPI tests.
JustinPrivitera
left a comment
Thanks so much for getting this in!
Resolves bullets in #1513.
Also changes ROVER_ERROR calls to ASCENT_LOG_ERROR calls, which correctly throw exceptions. Lots still TODO:
- Verify that this approach is what we want and mean by multiple energy group support. Seems to be.
- Add more tests, particularly MPI tests (I haven't tried it with MPI yet). Seems to work fine with MPI.
- Add validation somewhere to ensure that if we have more than one energy group, the number of absorption and emission fields are the same. We now check for this in at least 2 places.
- Each domain is only traced once, but there seems to be some modest overhead compared to doing multiple rover execute calls with one energy group at a time. My guess is that this is due to deinterleaving optical depth and intensity fields at the end (the access pattern is probably not cache friendly; see the sketch after this list). There's probably a smarter way to deal with it. The performance improvement is significant.
- Some of the baseline images using the image_topo topology seem to have abnormally large extents, currently not sure why. This was apparently due to slicing on a boundary edge with z = 0. Bumping this up to z = 0.001 produces a render with the expected extents.
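For context on the deinterleaving point above, here is a minimal sketch of the kind of strided access involved; the buffer layout and names are assumptions, not Rover's actual internals:

```cpp
// Hypothetical sketch: extracting one energy group's values from a buffer
// interleaved per pixel as [group0, group1, ..., groupN-1]. The read stride
// of num_groups is the sort of cache-unfriendly access pattern suspected of
// causing the modest deinterleaving overhead mentioned above.
#include <cstddef>
#include <vector>

std::vector<double> deinterleave_group(const std::vector<double> &interleaved,
                                       std::size_t num_groups,
                                       std::size_t group)
{
  const std::size_t num_pixels = interleaved.size() / num_groups;
  std::vector<double> field(num_pixels);
  for (std::size_t p = 0; p < num_pixels; ++p)
  {
    field[p] = interleaved[p * num_groups + group]; // stride = num_groups
  }
  return field;
}
```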